Complex-Valued 2D Gaussian Representation for Computer-Generated Holography

Zhan, Yicheng, Gao, Xiangjun, Quan, Long, Akşit, Kaan

arXiv.org Artificial Intelligence

We propose a new hologram representation based on structured complex-valued 2D Gaussian primitives, which replaces per-pixel information storage and reduces the parameter search space by up to 10:1. To enable end-to-end training, we develop a differentiable rasterizer for our representation, integrated with a GPU-optimized light propagation kernel in free space. Our extensive experiments show that our method achieves up to 2.5× lower VRAM usage and 50% faster optimization while producing higher-fidelity reconstructions than existing methods. We further introduce a conversion procedure that adapts our representation to practical hologram formats, including smooth and random phase-only holograms. Our experiments show that this procedure can effectively suppress noise artifacts observed in previous methods. By reducing the hologram parameter search space, our representation enables more scalable hologram estimation in next-generation computer-generated holography systems.
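To make the idea of replacing per-pixel storage with Gaussian primitives concrete, here is a minimal sketch in plain Python. It is not the authors' rasterizer: the parameterization (centre, axis-aligned standard deviations, amplitude, phase) and the function name `render_complex_field` are assumptions made for illustration, and the sketch is neither differentiable nor GPU-optimized.

```python
import cmath
import math

def render_complex_field(gaussians, height, width):
    """Sum complex-valued 2D Gaussians into a height x width complex field.

    Each primitive is a dict with hypothetical keys (an assumption, not the
    paper's parameterization): cx, cy (centre in pixels), sx, sy (standard
    deviations in pixels), amp (amplitude) and phase (radians).
    """
    field = [[0j for _ in range(width)] for _ in range(height)]
    for g in gaussians:
        # Complex coefficient shared by every pixel this primitive touches.
        coeff = g["amp"] * cmath.exp(1j * g["phase"])
        for y in range(height):
            for x in range(width):
                dx = (x - g["cx"]) / g["sx"]
                dy = (y - g["cy"]) / g["sy"]
                field[y][x] += coeff * math.exp(-0.5 * (dx * dx + dy * dy))
    return field

gaussians = [
    {"cx": 2.0, "cy": 2.0, "sx": 1.0, "sy": 1.0, "amp": 1.0, "phase": 0.0},
    {"cx": 5.0, "cy": 5.0, "sx": 1.5, "sy": 1.5, "amp": 0.5, "phase": math.pi / 2},
]
field = render_complex_field(gaussians, 8, 8)

# Parameter counts for this toy case only: 6 numbers per primitive versus
# 2 numbers (real and imaginary part) per pixel of an 8 x 8 complex hologram.
params_gaussian = 6 * len(gaussians)
params_per_pixel = 2 * 8 * 8
```

The parameter counts at the end only illustrate why a primitive-based representation can shrink the search space; they do not reproduce the 10:1 figure reported in the abstract.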


MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

Chen, Hui, Xiong, Miao, Lu, Yujie, Han, Wei, Deng, Ailin, He, Yufei, Wu, Jiaying, Li, Yibo, Liu, Yue, Hooi, Bryan

arXiv.org Artificial Intelligence

Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery. In this work, we introduce MLR-Bench, a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing. Our framework supports both stepwise assessment across these distinct research stages, and end-to-end evaluation of the final research paper. We then use MLR-Bench to evaluate six frontier LLMs and an advanced coding agent, finding that while LLMs are effective at generating coherent ideas and well-structured papers, current coding agents frequently (e.g., in 80% of the cases) produce fabricated or invalidated experimental results--posing a major barrier to scientific reliability. We validate MLR-Judge through human evaluation, showing high agreement with expert reviewers, supporting its potential as a scalable tool for research evaluation. We open-source MLR-Bench to help the community benchmark, diagnose, and improve AI research agents toward trustworthy and transparent scientific discovery.



DB-TSDF: Directional Bitmask-based Truncated Signed Distance Fields for Efficient Volumetric Mapping

Maese, Jose E., Merino, Luis, Caballero, Fernando

arXiv.org Artificial Intelligence

This paper presents a high-efficiency, CPU-only volumetric mapping framework based on a Truncated Signed Distance Field (TSDF). A key feature of the approach is that the processing time per point cloud remains constant, regardless of the voxel grid resolution, enabling high-resolution mapping without sacrificing runtime performance. In contrast to most recent TSDF/ESDF methods that rely on GPU acceleration, our method operates entirely on CPU, achieving competitive results in speed. Experiments on real-world open datasets demonstrate that the generated maps attain accuracy on par with contemporary mapping techniques. Volumetric mapping is a fundamental capability in mobile robotics, supporting tasks such as collision avoidance, motion planning, and the construction of consistent world models under real-time constraints. Point clouds and occupancy grids remain widely used on CPU-only platforms, as their simple data structures allow efficient processing without specialized hardware. However, they are prone to aliasing at high resolutions and often produce geometric artifacts that hinder downstream processing.
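For readers unfamiliar with TSDFs, the sketch below shows the classical weighted-average update along a single sensor ray. It is a generic textbook formulation, not the directional bitmask scheme of DB-TSDF, and it does not have the constant-time property described above, since it visits every voxel on the ray. The function names and the one-dimensional setup are assumptions chosen for brevity.

```python
def truncated_sdf(voxel_depth, surface_depth, truncation):
    """Signed distance from a voxel to the surface, clamped to +/- truncation.

    Positive values lie in front of the surface, negative values behind it.
    """
    sdf = surface_depth - voxel_depth
    return max(-truncation, min(truncation, sdf))

def integrate_ray(tsdf, weights, voxel_size, surface_depth, truncation):
    """Fuse one depth measurement into the voxels along a ray (in place)."""
    for i in range(len(tsdf)):
        depth = (i + 0.5) * voxel_size  # depth of the voxel centre
        if depth > surface_depth + truncation:
            continue  # too far behind the surface to have been observed
        d = truncated_sdf(depth, surface_depth, truncation)
        # Running average of all observations of this voxel.
        tsdf[i] = (tsdf[i] * weights[i] + d) / (weights[i] + 1)
        weights[i] += 1

tsdf = [0.0] * 10
weights = [0] * 10
integrate_ray(tsdf, weights, voxel_size=0.1, surface_depth=0.5, truncation=0.2)
```

After this call the stored values change sign between voxels 4 and 5, which is where the measured surface at depth 0.5 lies; voxels well behind the surface are left untouched.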


Explainable AI in Deep Learning-Based Prediction of Solar Storms

Rawashdeh, Adam O., Wang, Jason T. L., Herbert, Katherine G.

arXiv.org Artificial Intelligence

A deep learning model is often considered a black-box model, as its internal workings tend to be opaque to the user. Because of the lack of transparency, it is challenging to understand the reasoning behind the model's predictions. Here, we present an approach to making a deep learning-based solar storm prediction model interpretable, where solar storms include solar flares and coronal mass ejections (CMEs). This deep learning model, built based on a long short-term memory (LSTM) network with an attention mechanism, aims to predict whether an active region (AR) on the Sun's surface that produces a flare within 24 hours will also produce a CME associated with the flare. The crux of our approach is to model data samples in an AR as time series and use the LSTM network to capture the temporal dynamics of the data samples. To make the model's predictions accountable and reliable, we leverage post hoc model-agnostic techniques, which help elucidate the factors contributing to the predicted output for an input sequence and provide insights into the model's behavior across multiple sequences within an AR. To our knowledge, this is the first time that interpretability has been added to an LSTM-based solar storm prediction model.


Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker

Saxena, Rachna, Kumar, Abhijeet, Shanmugam, Suresh

arXiv.org Artificial Intelligence

Traditional information extraction systems built on text-only language models face challenges because such models do not consider infographics (visual elements of information) such as tables, charts, and images, which are often used to convey complex information to readers. Multimodal LLMs (MLLMs) face a needle-in-a-haystack problem when the search space is either a long context or a substantial number of documents. Late interaction mechanisms over visual language models have shown state-of-the-art performance in retrieval-based vision-augmented Q&A tasks. Yet a few challenges remain in using them for RAG-based multimodal Q&A. First, many popular and widely adopted vector databases do not natively support multi-vector retrieval. Second, late interaction requires computation that inflates the space footprint and can hinder enterprise adoption. Lastly, the current state of late interaction mechanisms does not leverage approximate nearest neighbor search indexing methods for large speed-ups in the retrieval process. This paper explores a pragmatic approach to making the vision retrieval process scalable and efficient without compromising performance quality. We propose a multi-step custom implementation that utilizes widely adopted hybrid search (metadata and embedding) and a state-of-the-art late interaction re-ranker to retrieve the best-matching pages. Finally, MLLMs are prompted as readers to generate answers from the contextualized best-matching pages. Through experiments, we observe that the proposed design is scalable (significant speed-up) and stable (without degrading performance quality), and hence can be used in production systems at enterprises.
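The two-stage pipeline described above can be sketched as follows: a cheap first pass combines a metadata filter with single-vector similarity, and a late interaction (MaxSim) scorer re-ranks only the shortlist. This is a sketch under stated assumptions, not the authors' implementation: the field names `meta`, `pooled`, and `patches`, the function `retrieve`, and the use of one pooled vector per page in the first stage are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, page_vecs):
    """Late interaction: each query vector takes its best-matching page
    vector, and the per-query maxima are summed."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

def retrieve(query_vec, query_vecs, pages, metadata_filter, top_k):
    # Stage 1: metadata filter plus single-vector similarity (hybrid search).
    candidates = [p for p in pages if metadata_filter(p["meta"])]
    candidates.sort(key=lambda p: dot(query_vec, p["pooled"]), reverse=True)
    shortlist = candidates[:top_k]
    # Stage 2: multi-vector re-ranking, applied to the shortlist only.
    return sorted(shortlist,
                  key=lambda p: maxsim_score(query_vecs, p["patches"]),
                  reverse=True)

# Toy data: two-dimensional embeddings, hypothetical page records.
pages = [
    {"id": "a", "meta": {"year": 2023}, "pooled": [1.0, 0.2],
     "patches": [[1.0, 0.0], [1.0, 0.0]]},
    {"id": "b", "meta": {"year": 2023}, "pooled": [0.5, 0.5],
     "patches": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "c", "meta": {"year": 2020}, "pooled": [0.5, 0.5],
     "patches": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "d", "meta": {"year": 2023}, "pooled": [0.1, 0.1],
     "patches": [[0.1, 0.1]]},
]
query_vecs = [[1.0, 0.0], [0.0, 1.0]]   # multi-vector query
query_vec = [0.5, 0.5]                  # pooled single-vector query

ranked = retrieve(query_vec, query_vecs, pages,
                  metadata_filter=lambda m: m["year"] == 2023, top_k=2)
ranked_ids = [p["id"] for p in ranked]
```

In this toy case the first stage prefers page "a", while the re-ranker promotes page "b" because its patch vectors cover both query vectors. Keeping the multi-vector scoring to a small shortlist is what lets the first stage run on a vector database that only supports single-vector search.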


Scalability and Maintainability Challenges and Solutions in Machine Learning: Systematic Literature Review

Shivashankar, Karthik, Hajj, Ghadi S. Al, Martini, Antonio

arXiv.org Artificial Intelligence

This systematic literature review examines the critical challenges and solutions related to scalability and maintainability in Machine Learning (ML) systems. As ML applications become increasingly complex and widespread across industries, the need to balance system scalability with long-term maintainability has emerged as a significant concern. This review synthesizes current research and practices addressing these dual challenges across the entire ML life-cycle, from data engineering to model deployment in production. We analyzed 124 papers to identify and categorize 41 maintainability challenges and 13 scalability challenges, along with their corresponding solutions. Our findings reveal intricate interdependencies between scalability and maintainability, where improvements in one often impact the other. The review is structured around six primary research questions, examining maintainability and scalability challenges in data engineering, model engineering, and ML system development. We explore how these challenges manifest differently across various stages of the ML life-cycle. This comprehensive overview offers valuable insights for both researchers and practitioners in the field of ML systems. It aims to guide future research directions, inform best practices, and contribute to the development of more robust, efficient, and sustainable ML applications across various domains.


Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts

Field, Severin

arXiv.org Artificial Intelligence

The development of artificial general intelligence (AGI) is likely to be one of humanity's most consequential technological advancements. Leading AI labs and scientists have called for the global prioritization of AI safety citing existential risks comparable to nuclear war. However, research on catastrophic risks and AI alignment is often met with skepticism, even by experts. Furthermore, online debate over the existential risk of AI has begun to turn tribal (e.g. name-calling such as "doomer" or "accelerationist"). Until now, no systematic study has explored the patterns of belief and the levels of familiarity with AI safety concepts among experts. I surveyed 111 AI experts on their familiarity with AI safety concepts, key objections to AI safety, and reactions to safety arguments. My findings reveal that AI experts cluster into two viewpoints -- an "AI as controllable tool" and an "AI as uncontrollable agent" perspective -- diverging in beliefs toward the importance of AI safety. While most experts (78%) agreed or strongly agreed that "technical AI researchers should be concerned about catastrophic risks", many were unfamiliar with specific AI safety concepts. For example, only 21% of surveyed experts had heard of "instrumental convergence," a fundamental concept in AI safety predicting that advanced AI systems will tend to pursue common sub-goals (such as self-preservation). The least concerned participants were the least familiar with concepts like this, suggesting that effective communication of AI safety should begin with establishing clear conceptual foundations in the field.


Regressing Transformers for Data-efficient Visual Place Recognition

Leyva-Vallina, María, Strisciuglio, Nicola, Petkov, Nicolai

arXiv.org Artificial Intelligence

Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained to have small distance for similar images and larger distance for dissimilar ones in a latent space. However, this approach struggles to ensure accurate distance-based image similarity representation, particularly when training with binary pairwise labels, and complex re-ranking strategies are required. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking, offering data-efficient training and strong generalization across several benchmark datasets.
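The difference between binary and graded supervision can be illustrated with a small sketch. The abstract does not state the exact loss or similarity function, so the choice of a squared error on cosine similarity, and the names `cosine` and `regression_loss`, are assumptions made for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def regression_loss(desc_a, desc_b, target_similarity):
    """Squared error between descriptor similarity and a similarity label."""
    return (cosine(desc_a, desc_b) - target_similarity) ** 2

# Two hypothetical descriptors of images whose fields of view overlap by 70%.
desc_a = [1.0, 0.0]
desc_b = [1.0, 1.0]

graded = regression_loss(desc_a, desc_b, 0.7)   # graded label: FoV overlap
binary = regression_loss(desc_a, desc_b, 1.0)   # binary label: "same place"
```

With the graded target the descriptors are already close to the label and the loss is small, whereas a binary label would keep pushing the pair toward identical descriptors even though the views only partly overlap.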


On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

Li, Chenyang, Chung, Jihoon, Cai, Biao, Wang, Haimin, Zhou, Xianlian, Shen, Bo

arXiv.org Machine Learning

Model compression is a crucial part of deploying neural networks (NNs), especially when the memory and storage of computing devices are limited. This paper focuses on two popular model compression techniques: low-rank approximation and weight pruning. However, training NNs with low-rank approximation and weight pruning often suffers from significant accuracy loss and convergence issues. In this paper, a holistic framework is proposed for model compression from a novel perspective of nonconvex optimization by designing an appropriate objective function. Then, we introduce NN-BCD, a block coordinate descent (BCD) algorithm to solve the nonconvex optimization problem. One advantage of our algorithm is that an efficient, gradient-free iteration scheme can be derived in closed form. Therefore, our algorithm does not suffer from vanishing/exploding gradient problems. Furthermore, using the Kurdyka-Łojasiewicz (KŁ) property of our objective function, we show that our algorithm globally converges to a critical point at the rate of O(1/k), where k denotes the number of iterations. Lastly, extensive experiments with tensor train decomposition and weight pruning demonstrate the efficiency and superior performance of the proposed framework. Our code implementation is available at https://github.com/ChenyangLi-97/NN-BCD
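To illustrate what a gradient-free block coordinate descent with closed-form updates looks like, the sketch below fits a rank-1 approximation of a small weight matrix by alternating exact minimization over two blocks. This is not NN-BCD and does not demonstrate the O(1/k) rate; it is a toy instance of the general technique, and the function names are hypothetical.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def objective(w, u, v):
    """Squared Frobenius error of the rank-1 approximation u v^T."""
    return sum((w[i][j] - u[i] * v[j]) ** 2
               for i in range(len(u)) for j in range(len(v)))

def rank1_bcd(w, sweeps):
    """Minimize ||W - u v^T||_F^2 by alternating closed-form block updates.

    Assumes the updates never produce an all-zero vector (true for the toy
    matrix below), so the divisions are well defined.
    """
    wt = transpose(w)
    u = [0.0] * len(w)
    v = [1.0] * len(w[0])
    history = []
    for _ in range(sweeps):
        # Block 1: exact minimizer over u with v fixed, u = W v / (v . v).
        vv = sum(x * x for x in v)
        u = [x / vv for x in matvec(w, v)]
        # Block 2: exact minimizer over v with u fixed, v = W^T u / (u . u).
        uu = sum(x * x for x in u)
        v = [x / uu for x in matvec(wt, u)]
        history.append(objective(w, u, v))
    return u, v, history

w = [[2.0, 0.0], [0.0, 1.0]]
u, v, history = rank1_bcd(w, sweeps=50)
```

Because each block update is an exact minimizer, the objective never increases from one sweep to the next, and no gradients or step sizes are involved. For this diagonal matrix the error settles at 1.0, the squared second singular value, which is the best any rank-1 approximation can achieve.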